{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Attention"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Why attention?\n",
    "\n",
    "Attention means keeping tabs on the most important parts. Attention comes from a key observation: Not all words are equal, and some words are more crucial to understanding the sentence than other. For example, the sentence \"It is raining outside\". You probably understand that it's raining outside if I say: \"Rain! Out!\". In this case, _it_ and _is_ are completely redundant. And if a model is trying to understand the sentence, throwing out _it_ and _is_ is probably not going to make a difference."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What's attention?\n",
    "\n",
    "So how to focus only on the most important part? One way to do it is to multiply the important parts by a large factor, while reducing the unimportant parts values (those parts are, in fact, numbers in machine's language). And that's what attention mechanism does."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Try attention in code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "\n",
    "import numpy as np\n",
    "from matplotlib import pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def softmax(x, t = 1):\n",
    "    exp = np.exp(x / t)\n",
    "\n",
    "    # sums over the last axis\n",
    "    sum_exp = exp.sum(-1, keepdims=True)\n",
    "    \n",
    "    return exp / sum_exp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "num = 5\n",
    "\n",
    "weights = softmax(np.random.randn(num), t=0.1)\n",
    "data = np.random.randn(num)\n",
    "\n",
    "print(weights)\n",
    "print(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "average = data.sum() / data.size\n",
    "attn_applied = weights @ data\n",
    "\n",
    "print(average)\n",
    "print(attn_applied)\n",
    "\n",
    "print(weights.argmax())\n",
    "print(data[weights.argmax()])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "See how the attention mask makes the weighted average of data closer to the desired place."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
